Fractal and Statistical Properties of Large Compact Polymers: a Computational Study 
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We propose a novel combinatorial algorithm for efficient generation of Hamiltonian walks and 
cycles on a cubic lattice, modeling the conformations of lattice toy proteins. Through extensive 
tests on small lattices (allowing complete enumeration of Hamiltonian paths), we establish that 
the new algorithm, although not perfect, is a significant improvement over the earlier approach by 
Ramakrishnan et. al. Q, as it generates the sample of conformations with dramatically reduced 
statistical bias. Using this method, we examine the fractal properties of typical compact conforma- 
tions. In accordance with Flory theorem celebrated in polymer physics, chain pieces are found to 
follow Gaussian statistics on the scale smaller than the globule size. Cross-over to this Gaussian 
regime is found to happen at the scales which are numerically somewhat larger than previously 
believed. We further used Alexander and Vassiliev degrees 2 and 3 topological invariants to identify 
the trivial knots among the Hamiltonian loops. We found that the probability of being knotted 
increases with loop length much faster than it was previously thought, and that chain pieces are 
consistently more compact than Gaussian if the global loop topology is that of a trivial knot. 



I. INTRODUCTION 

The dominant mood among the protein folding experts 
these days seems to suggest that we are rapidly approach- 
ing the day when experiments and theory - or, rather, 
simulations - will be ready for direct quantitative com- 
parison. New generation experiments, including single 
molecule ones |^ |^ provide the long awaited insights 
into the folding paths. New proteins are discovered or 
invented exclusively with the goal to see their folding on 
the time scale more accessible to simulations. In the com- 
plementary drive, modern computer simulations 
particularly those employing so-called distributed com- 
puting Q, not only consider explicitly all atoms (al- 
though no explicit water), but also rapidly improve in 
terms of the ways to treat forces involved j^, 0, . 
The impressive episode of a theoretical prediction pl3 | 
verified by the experiment |3 is celebrated as the 
sign of approaching new level of integration between the- 
ory and experiments. 

In our opinion, all these shining achievements only 
highlight once again how badly we need a better insight 
into the simple fundamentals of folding. Just as the de- 
coding of genomes does not cancel, but strengthens the 
pressing need of orders of magnitude higher throughput 
reading systems, in the same way deeper understanding 
of the underlying simple physical principles behind pro- 
tein folding remains one of the most needed pieces of the 
puzzle. With this point in mind, in this work we try to 
address deeper the properties of the simplest caricature 
proteins, namely, lattice ones. 

Of course, in our work with simple toy models we 
should keep an eye on the progress of more elaborate 
studies. What do they teach us? In the opinion of 
the present authors, what stands out as a common les- 
son in all computational studies of protein folding is the 
central importance of the interplay between two triv- 



ial facts - the first is that proteins are polymers, and 
the second is that they are compact (globular) polymers. 
Very hi ghly non-trivial geometry comes with these facts 
El ElM III IS EH. This opinion was also expHcitly 
formulated in the recent News and Views psj . 

What do we know about compact polymer conforma- 
tions? Protein data bank contains large and rapidly 
growing collection of conformations. Should there be 
any general principle behind these conformations? Many 
authors are looking for such principles, either biological 
(selection-driven), or physical, geometrical, etc. Not even 
starting to discuss the existing theories, their advantages 
and disadvantages, we would like to point out that such 
discussion remains premature as long as properties of 
random compact conformations are not understood well. 
Indeed, having no insight into the majority of arbitrary 
conformations, we cannot judge how non-random are the 
conformations in protein data bank. For instance, there 
are relatively few knots in native proteins (22il23 . l2l . l25| : 
is it because unknotted conformations are somehow bio- 
logically selected, or are they physically preferable for, 
e.g., folding - or alternatively, maybe, what seems to 
be "few" for us is, in fact, statistically expected number 
of knots in compact conformations of the given length? 
Currently, we cannot answer this. 

The theory of random compact conformations is well 
developed on the mean field level (see, e.g., in the book 
[26j|). This is the theory of homopolymer globules, be- 
cause they are entropically dominated by the most typ- 
ical conformations. Major conclusion of the mean field 
theory is that chain segments inside the globule follow 
Gaussian statistics, and do not exhibit any signs of or- 
der. This conclusion is in sharp contradiction with the 
statements in the literature [23, IH that compact- 
ness of the conformation may favor elements of secondary 
structures, such as a-helices and /3-pins. 

Computationally, the problem of compact conforma- 
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tions is closely related to that of Hamiltonian walks on 
the graphs. We remind the reader that the concept of a 
Hamiltonian walk was introduced by Hamilton in connec- 
tion with famous Euler problem of Konigsberg bridges: 
the task was to find the Sunday promenade passing ev- 
ery one of the seven bridges, never returning to the al- 
ready visited place. In general, Hamiltonian walk on an 
arbitrary graph can be defined as a walk which visits ev- 
ery site on the graph once and only once. If our graph 
is, say, £ x m x n piece of the cubic lattice in 3D, then 
Hamiltonian walk on such graph is the same as maximally 
compact conformation of the polymer filling i x m x n 
domain. 

Enumeration of Hamiltonian walks on graphs is well 
known problem in combinatorics. Of course, the best 
possible statistics is achieved by exhaustive enumeration 
of all Hamiltonian walks. This is possible for rather short 
polymer chains only: for the chains with 27 monomers 
filling 3 X 3 X 3 of the cubic lattice , and also for 36- 
and 48-mers, filling 3x3x4 and 3x4x4 segments, re- 
spectively |31j. Obviously, these chains are far too short 
to address statistics and fractal structure of the typical 
conformation. 

Short of exhaustive enumeration, other methods to 
generate larger compact conformations have been sug- 
gested. The most straightforward Monte Carlo chain 
growth methods [s^l are totally inefficient for long com- 
pact chains, because of catastrophic explosion of rejected 
looped conformations. Transfer matrix approach put for- 
ward by |U 0, is very efficient for the chains filling 
an elongated domain i x m x n, where one of the dimen- 
sions, say n, may be arbitrarily large. Unfortunately, to 
remain within computational tractability, two other di- 
mensions, £ and m, must be small, not greater than 2 or 
3. An alternative approach, suggested in Q, is free of 
this limitation. It employs combinatorial techniques of 
two-matching and patching of bipartite graphs. Unfor- 
tunately, we found that this method generates conforma- 
tions in a heavily biased way. 

The objective of our work is three-fold. First, we re- 
port the improvements to the algorithm by Ramakrish- 
nan et al j| . We must mention at once that even the im- 
proved method is not free of biases; however, it is signif- 
icantly better in this respect than the original approach 
0. Second, we investigate the properties of the gen- 
erated compact conformations (Hamiltonian walks) and 
cycles against the polymer length. The largest walks gen- 
erated have the size 22 x 22 x 22. Third, we examine the 
topology of maximally compact closed loops, including 
the loop length dependence of the trivial knot probabil- 
ity, as well as the local fractal structure of the typical 
conformation for both averaged loop and the loop which 
is trivial as a knot. 

The article is organized as follows. The proposed new 
algorithm is formulated in details in the next section ITTl 
The results of the implementation of this algorithm are 
presented in section lHll The topological properties of the 
compact knots are considered in the section Hvl At the 



end, we discuss the conclusions from our study in section 

m 

II. METHODS 
A. Construction of the lattice graph 

We performed our simulations on L x L x L cubic lat- 
tices with L = 2, 3, . . . , 22, but our algorithm applies for 
any finite regular bipartite graph. The graph is called 
bipartite if two colors suffice to paint it in such a way 
that every two neighboring vertexes have different col- 
ors. Chess board is a good example of a bipartite graph; 
three vertices connected as a triangle is an example of 
a graph which is not bipartite. We call the graph, or 
lattice, even or odd if the total number of vertexes, N, 
and, therefore, the length of Hamiltonian walk, is even 
or odd, respectively. Obviously, L x L x L cubic lattice 
is the bipartite graph, with N — L^; it is even or odd for 
even or odd L, respectively. 

The following very simple theorem can be established 
regarding the Hamiltonian walks on bipartite graphs. If 
a bipartite graph is colored, say, using black and white 
colors, then the walks on this graph necessarily step from 
black to white or vice versa. Therefore, every Hamilto- 
nian walk on an even lattice starts and ends on differ- 
ent colors, while on the odd lattice its ends occupy the 
vertices of the same color. Moreover, on the odd lat- 
tice one of the colors can be called major, because there 
are more sites of one color than the other {{N + l)/2 
vs. {N — l)/2). We shall call this simple statement the 
chess board theorem. One of the conclusions of the 
chess board theorem is that the Hamiltonian cycles are 
impossible on the odd lattices, because every cycle on 
the bipartite graph must contain equal number of sites 
of both colors. 

From the discussion above, it may seem that genera- 
tion of Hamiltonian walks on odd and even lattices, and 
generation of Hamiltonian cycles on even lattices, are 
three very different problems which should be treated 
separately. In fact, they can all be reduced to one an- 
other by the trick proposed in the article 0. Let us 
introduce extended graph by adding some out- of -lattice 
vertices using the following rules: 

• In case of even lattice, we add two out-of-lattice 
vertices of different colors (see Fig. ^). We con- 
nect them to each other, and each of them - to all 
the lattice vertices of the opposite color. 

• In case of odd lattice, we add only one out-of-lattice 
vertex, which is colored minor color and connected 
to all major color "real" vertices (Figure^). 

Constructed this way, extended lattices are always even. 
Therefore, all we have to do is to generate Hamiltonian 
cycles on the even lattices. As soon as that problem 
is addressed, we can generate Hamiltonian cycle on the 
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FIG. 1: The construction of the lattice graphs for generation 
of a) Hamiltonian walk on even lattice; b) on odd lattice; c) 
Hamiltonian cycle. The walks are drawn as solid lines and 
the edges of the lattice graphs as dash lines. 



extended lattice and obtain open Hamiltonian walk by 
just removing the out-of-lattice vertices. 



B. The algorithm 

The original combinatorial algorithm by Ramakrish- 
nan et al 1] consists of two steps. First, it generates some 
configuration of sub-cycles and sub-chains with dead ends 
on the lattice by means of two-matching procedure; sec- 
ond, it transforms these pieces into a single Hamiltonian 



walk using another procedure called patching. The main 
novelty of our algorithm is that the formation of sub- 
cycles and sub-chains is forbidden, and we always gener- 
ate the single Hamiltonian cycle on the extended lattice 
graph. Thus, patching stage becomes unnecessary. We 
explain in the Appendix ^ why the formation of small 
loops and sub-chains in the original method Q biases 
sampling of the Hamiltonian walks. 

The algorithm works by placing links on the lattice 
graph. At the beginning, the lattice graph contains no 
links. Then, algorithm starts placing links randomly, 
connecting randomly chosen neighboring vertices (Figure 
Every time a new link is chosen, we check whether it 
forms an unwanted small subtitle or a dead end (Figure 
Eb), and the link is rejected if this happens. (The only 
little exclusion from the general rule is required for an 
even lattice, where the first link is always drawn between 
the out-of-lattice vertices, and this link is never removed 
on the later steps of the algorithm.) The algorithm stops 
when all vertices of the graph are saturated by two links 
each, and the links form a Hamiltonian cycle. The obvi- 
ous difficulty is that randomly chosen vertex frequently 
cannot be linked to its randomly chosen neighbors, be- 
cause the latter is already saturated (Figure This is 
the situation in which two-matching is applied. 

Two-matching starts from picking up a vertex, P, 
which is currently either not connected, or has only one 
incoming link. Then, its random neighbor Q is chosen as 
an opposite end of the new link. If Q belongs to some 
linear sub-chain, we peak up randomly one of the links 
incoming to it and follow this direction along the sub- 
chain. When the sub-chain terminus is found, it is in- 
vestigated for the possibility to be connected with one 
of its neighbors. For each vertex, all the non-saturated 
neighbors ending the sub-chain are placed on the special 
list. The neighbors are not included in the list if linking 
with them leads to the formation of sub-cycles or dead 
ends (Figure Then, a random vertex from the list 
(of course, if the list is not empty) is chosen, and the new 
link is drawn (Figure Et) . The growth of the sub-chain 
is followed by the switching of the links incident on Q. 
The link such as QS (see Figure |2f; the link opposite to 
the one pointing to the end just elongated) is removed 
and the new link PQ is drawn, subject to the following 
two conditions: i) the vertex P is still unsaturated after 
the elongation of the sub-chain; ii) linking the vertices P 
and Q does not produce subtitle or dead end. Depending 
on the success of two processes contributing to the two- 
matching, the number of links on the graph increases by 
one, remains the same, or decreases by one. In our sim- 
ulations, the latter case was rare and did not slow the 
process too much. 

The new links are placed on the graph until finally a 
single cycle passing once and only once through every 
vertex of the graph (including the out-of-lattice ones) is 
formed. 
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FIG. 2: Schematic representation of the application of the 
algorithm. For simplicity, steps of the algorithm are shown in 
two dimensions. See text for further explanations. 



C. Algorithm performance test 



We implemented the algorithm described just above to 
generate linear polymer chains up to the size 12x12x12 
on even lattices, up to 15 x 15 x 15 on odd lattices, 
and the compact cycles of the sizes up to 22 x 22 x 22. 
On the lattices larger than mentioned this algorithm be- 
comes exponentially slow, however, for the investigated 
lattices, we found the CPU time necessary to generate 
one chain conformation demonstrates power law depen- 
dence on the length of the walk, N. The effectiveness of 
our algorithm executed on the Pentium III 1.1 GHz PC is 
demonstrated in Figure O The run time scales approxi- 
mately as N'^'^ for both linear polymers and cycles for the 
moderate chain lengths. This is slower than performance 
reported in P| for the original algorithm (~ N^-^). This 
is the price we must pay to ensure fair sampling. Still, 
our algorithm allows to generate compact polymer chains 
within the length range of several orders of magnitude. 
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FIG. 3: Performance of the algorithm for generation of Hamil- 
tonian walks and cycles on cubic lattices. The results for even 
walks are shown as triangles, for odd walks as diamonds, for 
cycles as squares. 



D. Topological aspects 

There exists abundant literature on computational 
studies of the knot composition of non-compact closed 
chains, starting with the pio neering work of Vologodskii 
et al 111 m IS E3- These studies are mostly 

motivated by the intent to model closed circular DNA. 
There are much fewer studies made with compact chains 
|4^ E3l> although the question of knots in proteins is 
widely considered a puzzle |2ll23L l2ll25l| . 

We should particularly emphasize the work by Mans- 
field ^3 > where he addressed knots in Hamiltonian cycles 
on the cubic lattice. What we add here to his analysis is 
we pull it to significantly longer loops, which turns out 
to be essential, and we also study the statistics of the 
sub-chains in the loop whose overall global topology is 
fixed. 

As in all previous works, we applied the theory of knot 
invariants to determine the knot-type of a given confor- 
mation. Knot invariants are mathematical objects that 
serve as a 'signature' of the knot-type. As a signature, 
knot invariants are, unfortunately, not unique to a given 
knot. The use of the appropriate types and number of 
knot invariants yields only a good likelihood that the knot 
has been identified correctly. This likelihood is high, in 
certain cases unity, if the number of crossings in the knot 
projection could be reduced to a sufficiently small num- 
ber. The difficulty we have to face here is that compact 
conformations have typically very large numbers of cross- 
ings on the projection. 

In this work, we calculated for a knot K three invari- 
ants - the Alexander polynomial (A(i)i<-) evaluated at a 
certain value of t, (A(— 1)^^), the Vassiliev invariant of 
degree two {v2{K)), and the Vassiliev invariant of degree 
three (113 (if)) - as was also done in '4]1|. A connection 
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is made between a conformation and its knot-type if the 
invariants calculated from the projection of the confor- 
mation coincide with the invariants associated with the 
knot- type. 

In order to illustrate the necessity of topological invari- 
ants in identifying even the simplest knots, including the 
trivial knot (which is an unknot) we show Figure 21 In 
fact, the loop shown in this figure is a trefoil knot, but it 
is virtually impossible to realize this fact by eye. 
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FIG. 4: Projected nodes and links of a 6 x 6 x 6 conformation. 
The knot formed is a trefoil. 



Thus, after a compact conformation has been gener- 
ated, the procedure for determining its knot-type in- 
volves the following steps: (1) Generate Plane Projec- 
tion; (2) Preprocess Projection; (3) Compute Knot In- 
variants from Projection; (4) Match Conformation with 
Knot-type using Tabled 



1. Preprocessing Projection 

The goal of preprocessing the projection is to sim- 
plify the knot by reducing the number of intersections 
or crossings of the projected links. The intuitive local 
'moves' that can accomplish this simplification are called 
Reidemeister moves (see, for instance, 44]). Given the 
very complicated nature of typical compact conforma- 
tions, we resort to combinations of Reidemeister moves, 
compounded, or 'macro', as discussed in :45j. 

For large conformations, a further simplification can 
be achieved by first 'inflating' the conformation before 
taking the projection. A less dense conformation leads 
to a significant reduction of crossings. In fact, this was 
done for 14 x 14 x 14 conformations before the Vassiliev 
invariants were evaluated. 



TABLE L Values of knot invariants for a few knots. 



KNOT 


Alexander, 


Vassiliev, 


Vassiliev, 


CHIRAL? 




|A(-1)k| 


V2{K) 


\MK)\ 




Oi 


1 








NO 


(Trivial) 










3i 


3 


1 


1 


YES 


4i 


5 


-1 





NO 


5i 


5 


3 


5 


YES 


52 


7 


2 


3 


YES 



2. Computing Knot Invariants 

An algorithm for computing the Alexander polynomial 
A{t)K is presented clearly in [s^ and will not be dis- 
cussed any further here. Suffice it to say that the algo- 
rithm requires the construction of an 'Alexander' matrix 
from the knot projection, with dimension equal to the 
number of crossings. The determinant is subsequently 
calculated after setting t to —1 to obtain the single num- 
ber A(-l)i^. 

The geometrical origin of this invariant may be traced 
to 'linking' numbers calculated from a set of closed 
curves. These closed curves are associated with a 'Seifert 
surface' whose boundary is the knot |4l |. 

The calculations for the Vassiliev invariants 
(v2(K),v:^(K)) are presented as diagrammatic for- 
mulas in j4y|. These formulas operate on a Gauss 
diagram, or equivalently on a Gauss code for a knot 
K. The set of Vassiliev invariants may be considered as 
a generalization of the Gauss integral formula for the 
linking number. 

As mentioned earlier, it is possible for two distinct 
knots to have the same set of knot invariants. However, 
we expect that the false identification of a knot would be 
rare. For instance, the set of three knot invariants for the 
trivial knot is distinct from those of (prime) knots with 
10 minimum crossings or fewer (249 knots in all) in their 
projection. 



III. RESULTS: COMPACT CHAINS 
A. Statistics for the small lattices 

As a first test of our algorithm, we compare the statis- 
tics of generated random samples with the results of ex- 
haustive enumeration for 2x2x2 and 3x3x3 cubic 
lattices. 

For the 2x2x2 lattice the task is easy, because the 
complete list consists of only 3 symmetrically unrelated 
Hamiltonian walks. These walks are shown in the Figure 
The unbiased algorithm should generate each of these 
3 conformations with probabilities 1/3. We generated 
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TABLE II: The average fractions of different 2x2x2 confor- 
mations in generated samples obtained with two algorithms. 





Conformation 


Algorithm 


1 


2 


3 


Ramakrishnan et al \1] 


0.278 


0.358 


0.364 


present 


0.328 


0.328 


0.344 



samples of 100000 walks using our algorithm and using 
the original algorithm of Ramakrishnan et al 0. The 
average fractions of different walks in generated samples 
obtained with both algorithms are shown in Table ITU 
Clearly, the algorithm jl| fails this test; the reasons of its 
failure are explained in the Appendix 1X1 




FIG. 5: There are three symmetrically unrelated conforma- 
tions possible on 2 X 2 X 2 cubic lattice. 

For the 3x3x3 lattice a little more elaborate procedure 
is necessary. Suppose, there are some M conformations 
(for instance, M = 103346 for 3 x 3 x 3 lattice [13), 
and suppose we repeatedly apply one and the same al- 
gorithm to generate a number K of Hamiltonian walks. 
Apart from glitches with the random number generators, 
subsequent applications of the algorithm are statistically 
independent. Therefore, for every conformation i there 
is the occurrence probability pi. For the unbiased algo- 
rithm. Pi = 1/M; in general, = pi — 1/M measures the 
bias. To examine this bias, we compute the distribution 
TOfc - for every number of appearances fc, is the num- 
ber of conformations that appeared k times in K trials. 
Obviously, nik is normalized such that X^feLo "^fe ~ 
Since appearances of every particular conformation are 
binomially distributed, we have 



M 



rrik 



\K-k 



K\ 



i=l 



k\{K-k)\ 



(1) 



where the summation runs over all conformations. From 
here, it is not difficult to find that, first of all, the average 
(over all conformations) appearance number is A: = K jM , 
it is independent of a bias. The information about the 
bias is contained in further moments of the distribution. 
Specifically, we consider the further cumulants of the dis- 
tribution of e,-: variance 



1 



(2) 



skewness 

\c I cum - 

and kurtosis 

cum ■ 



1 

^3 



ik-kf-i{k-kf + 2^ 



(3) 



1 

X4 



{k-k) -6{k-k) + 



+ n{k - k)^ - 3{k - k)^ -6^ 



(4) 



where averaged (over all conformations) powers of e are 
defined according to 



1 *^ 



(5) 



0j04 



Oj03 




2SM 250 



FIG. 6: The computed distributions of not symmetrically re- 
lated conformations on 3 x 3 x 3 lattice by the frequency of 
generation obtained by our method (columns) and method of 
article U (grey line) compared with the distribution expected 
for the unbiased algorithm (black line). Here, k is the num- 
ber of times a conformation appeared in K = 10000000 trials, 
while mfc, for every k, is the number of conformations which 
appeared k times. The number of different Hamiltonian walks 
on 3 x 3 x 3 lattice is M = 103346. 

We generated two samples oi K = 10000000 Hamil- 
tonian walks by means of our algorithm and the one of 
the article Q and compared the appearance of different 
Hamiltonian walks in these samples. The obtained dis- 
tributions TOfc for both algorithms are shown in Fig. 
The distribution for the unbiased e = case (when it 
is simply a Gaussian with the mean K/M and variance 
also K/M) is also presented in the same Figure. The pa- 
rameters of the computed distributions are summarized 
in the Table iHll 
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TABLE III: The parameters of computed distributions of con- 
formations on 3 X 3 X 3 lattice obtained with two algorithms. 



Algorithm 


l/A/ 


l/M 


l/A/ 


Ramakrislman 
et al [1] 


0.34 


0.34 


0.35 


present 


0.12 


0.09 


0.21 



As the data indicate, our algorithm produces the dis- 
tribution, which is close to the expected unbiased result. 
The distribution shape is very closely Gaussian, which 
means the bias is weak. At the same time, the algorithm 
of the article Q showed poor results and produced the 
distribution, which is essentially skewed. This demon- 
strates strong biases of that method. 

A not so good news about our algorithm is that the 
width of the distribution is still larger than expected for 
unbiased sampling. Given the width of the distribution 
we can estimate the bias from formula (0), e = 1.2 x 10~®. 
This signals certain bias, about 10%, in the generation 
of Hamiltonian walks. However, the bias is small, and 
certainly much smaller than for the previous algorithm. 
In what follows, we shall examine the statistics of Hamil- 
tonian walks generated by our algorithm and neglecting 
its bias. 



B. Statistics of segments and loops in generated 
walks 

By the statistics of segments we understand the fol- 
lowing. Imagine a long polymer compressed in a very 
compact state, and suppose a part of the chain, some 
t monomers long, is labeled. For instance, it may be 
deuterated. Then, we can study the conformation of the 
labeled segment. Is it collapsed, with the overall size 
scaling as l^^^l Is it extended, with end-to-end distance 
scaling as Does it exhibit any signs of regularity, such 
as helical structure of some sort? Or is it purely random, 
yielding Gaussian statistics with the size scaling as i^/'^l 
This is the question we want to address here. 

To begin with, let us remind the major conclusions of 
the mean field theory (see, e.g., review in the book [2^). 
This theory suggests that labeled chain segment behaves 
similarly to the labeled chain in a macroscopic polymer 
melt or concentrated solution of different chains. There- 
fore, it obeys Flory theorem ^3.EHE3- To appreciate 
the highly non-trivial statement of the Flory theorem, 
one has to realize first of all that either labeled chain 
in the concentrated melt, or labeled ^-segment in the 
globule, is subject to the volume exclusion constraint: 
trivially, other monomers cannot penetrate the volume 
occupied by any given monomer. As it is well known 
in polymer physics, volume exclusion leads to polymer 



swelling, with significant correlations between monomers, 
and with chain size scaling v « 0.588 ~ 3/5. It is 
not difficult to realize that the presence of surrounding 
chains in the melt, or surrounding parts of the same chain 
in the globule, leads to some effective attraction between 
labeled monomers. Flory theorem says that this attrac- 
tion exactly compensates the excluded volume effect. In 
other words, surrounding polymer medium shields ex- 
cluded volume effect, leaving labeled chain with Gaussian 
statistics and the size proportional to This screen- 
ing is sometimes called Edwards screening, it is similar 
to Debye screening in plasma. 

What is the range of t in which Gaussian scaling is 
expected? Of course, I must be larger than the effective 
Kuhn segment - which is equal to unity for the lattice 
model. Another restriction, relevant for the globule and 
not for the melt, is that labeled segment as a whole should 
be away from globule boundaries, or surfaces. Assuming 
globule size about N^/^ for the globule of density one 
and the chain of N monomers, we arrive at the condition 
£1/2 <iVi/3, or l<£<iV2/3. 

Although this is not very important for the present 
study, we would like to digress to inform the reader that 
even within the mean field level, there are delicate correc- 
tions to the simple picture as described above. To under- 
stand this, one should think of an auxiliary problem of a 
Gaussian polymer without excluded volume confined in 
a cavity with impermeable walls. Under such conditions, 
chain adopts a conformation with density peaked at the 
middle of the cavit y an d with density almost vanishing 
at the cavity walls |2g. The contrast between this the- 
oretical model and the real globule with fiat distributed 
internal density suggests that self-consistent field acting 
inside the globule not only compresses the chain, acting 
like a cavity, but also pulls the monomers from glob- 
ule center to the periphery. This pull slightly perturbs 
Gaussian statistics of the sub-chains, particularly those 
located nearby the globule boundary. Computationally, 
we shall not look into this delicate effect in our present 
study. 

Thus, we compute the mean square end-to-end dis- 
tances of the segments of Hamiltonian walks: 

where £ is the contour length of the segment of the walk 
(in units of steps), K is the total number of walks in the 
sample, N is the length of the walk, r^"'"' is the position 
vector of the vertex visited i-th in the j-th walk. 

The results for the samples of Hamiltonian walks of 
different lengths are presented in Figure|7| In good agree- 
ment with mean field theory, on the scales smaller than 
Af2/3 ^Y^Q walks obey Flory theorem and the aver- 
age distance between the segment ends scales such that 
(i?2(£)) ~ £. We would like to note here that Flory theo- 
rem does not tell us anything about the prefactor of this 
scaling. Fitting on the statistics of the lattice polymer 
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FIG. 7: Mean square end-to-end distance of the segments of 
Hamiltonian walks vs. the lengths of segments is shown for 
the lattices of different sizes. The curves for linear walks and 
cycles on the lattices 4x4x4, 8x8x8 and 12 x 12 x 12 
coincide. 



cycles of the size 22 x 22 x 22 suggests the prefactor to be 
equal ~ 1.5 > 1. For the polymer chain without excluded 
volume it is exactly equal to 1. Therefore, the excluded 
volume effectively increases the Kuhn segment length. 

On the scales £ ^ N, the walk starts feeling the con- 
finement by the lattice borders, and {B?{t)) levels off. 

Another measure of the agreement between statistics 
of Hamiltonian walks and Flory theorem is the looping 
probability. The Figure shows how often the loops 
of different contour lengths appear in the Hamiltonian 
walks. Here, we say that the walk makes a loop of the 
length ^, if after visiting site with the coordinates r,; it 
visits one of this site neighbors in exactly £ steps. What 
does the mean field theory have to say about these loops? 

As we saw for the statistics of end-to-end distances, on 
the scales i < A^^/^, the Hamiltonian walks are Gaussian. 
Then, the probability distribution of their end-to-end 



vectors R must obey Gaussian law ~ I exp 

For the loop, i? = 1. Therefore, average number of loops 
of the contour length (. should decay as exp(— 
with growing L That is why the number of loops on the 
vertical axis of the Figure is weighted by the factor 
of exp(— 1/£). We can express surprise that power law 
comes so slowly and appears only at large A^ (see 
the table on the inset to Fig. 

We can also check cross-over value of I and how it de- 
pends on A^. Vertical lines on the Figure |HJ^ mark the 
characteristic segment lengths at which the cross-over 
takes place for the polymer chains of different length. 
And Figure shows the dependency of these threshold 
values on the polymer length A^. It is clearly seen that £ 
scales as N'^/'^. 

On the larger scales, i > N^^^, the probability to find 
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FIG. 8; (a)The average number of loops of various contour 
length in generated Hamiltonian walks on the lattices of dif- 
ferent size. Vertical lines display the cross-over values of £ at 
which looping probability saturates. Horizontal dash line cor- 
responds to the predicted saturation level for the 22 x 22 x 22 
walks, (b) The dependence of the cross-over value of £ on the 
polymer length. 



the loop of length £ saturates and becomes practically 
independent of £. To estimate its constant value, we can 
resort to the following argument. The random walk of a 
length greater than A^^/^ hits the borders of the lattice. 
The end of the longer walk may be found in any lattice 
site with nearly equal probability 1/A^. Since the loop 
formation condition is met by {z) of sites neighboring 
to the loop starting site, the loop probability is about 
^ (z) / N. Here, (z) is the mean coordination number 
of the lattice (which takes into account that the sites on 
the surface have fewer neighbors than those in the bulk) . 
At the same time, there are A^ — A^^/^ ss A^ such loops 
possible, therefore, there must be about {z) loops of each 
length found in every walk. Indeed, the horizontal dash 
line on the FigurejSjL corresponding to (z) of the compact 
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walk of the size 22 x 22 x 22 reasonably estimates the 
number of long loops in the globule of this size. 

The results presented in Figure |H1 are in full agreement 
with the theory, both in terms of the power law decay 
^£-3/2-^ at moderate ^, the range of the cross-over (£ ~ 
N'^/^), and the constant levels at large I {{z))- 



C. Correlation between ends in Hamiltonian walks 

It is an interesting question in the theory of polymer 
globules, whether the ends of the polymer chain are ef- 
fectively independent of each other in terms of their po- 
sitions inside globules, or they repel (attract) due to the 
conditions of the connectedness and compactness of the 
chain. If the end of the chain is located in the bulk of the 
globule, there may be entropic cost associated with the 
rearrangement of the parts of the chain surrounding it 
due to necessity to keep the compactness of the globule. 
This local rearrangement of the polymer chain may affect 
the probability of the other end to locate in the vicinity. 
Effectively, this may lead either to the attraction, or to 
the repulsion of the ends of the chain. Theoretically, this 
issue remains currently unclear js^l . 

To check on the existence of such effective interaction 
between chain ends, we calculate the end-end correla- 
tion coefficient for the samples of generated Hamiltonian 
walks. This quantity is defined via the formula 



(7) 



where 2:1 and X2 are the a;-coordinates of the two chain 
ends, (...) means averaging over all sampled walks. For 
simplicity, we place coordinate system origin in the cen- 
ter of the cube, such that (xi) = (^2) = 0. Due to the 
symmetry, correlations coefficients for y and z coordi- 
nates are the same as for x, while all the non diagonal 
elements (such as {xiy2) etc.) vanish. 

The results obtained from the simulations on the lat- 
tices of the size L = 2,3,. ..,10 are presented in Figure 
El along with the data of the exhaustive enumeration for 
the 2x2x2 and 3x3x3 lattices and the exact re- 
sults for the disconnected ends model (which, due to the 
chess board theorem, is only meaningful for odd lattices; 
for even lattices, two ends must be on the oppositely 
colored sites, and, therefore, are not correlated at all). 
The results for the small lattices are very close to exact 
(whereas the original algorithm T| produces significant 
systematic errors). This is another good suggestion that 
our algorithm has weaker bias than that of the work Q . 

The fact that correlation coefficient is negative indi- 
cates that there is some effective repulsion between the 
chain ends. This effect decreases and supposedly goes 
to zero with increasing of lattice size. Moreover, correla- 
tion between ends very rapidly approaches correlation be- 
tween disconnected points subject only to excluded vol- 
ume condition. This observation suggests that even the 
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FIG. 9: Mean diagonal end-end correlation coefRcients for the 
Hamiltonian walks on the lattices of different sizes. The data 
of exact calculation for 2x2x2 and 3x3x3 lattices are shown 
as The data of exact calculation of correlation coefficients 
for the random pairs of dots on the odd lattices obeying the 
excluded volume condition and the chess board theorem are 
shown for the comparison as x . The results obtained from 
generation of the walks with algorithm of the work are 
shown as the dashed line. 



small repulsive correlation between chain ends is mostly 
due to the benign excluded volume effect of the terminal 
monomers, and chain connectivity provides only faint, al- 
though also repulsive, contribution (probably mostly due 
to excluded volume of monomers next to the terminal 
ones). 



IV. RESULTS: COMPACT LOOPS AND THEIR 
KNOTS 

A. Average Crossing Number 

Figure [TUI displays the average number of crossings in 
the plane projection of a conformation, together with the 
reduced number and mathematical prediction, for the 
range of sizes L = 4 to L = 20. The crossing numbers 
are plotted against the length (number of monomers) 
iV = h^. 

The prediction 



C 



3(L 
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for the average crossing number of an L x L x L con- 
formation follows from the assumption that every seg- 
ment upon projection in some 'vertical' direction pro- 
duces crossings with all segments above and below it 
inside the cylinder of the cross-section unity. In this 
sense, the result for the average crossing number is triv- 
ial. However, it is interesting to note that for large L, 
the expression for the average crossing number scales as 
C = = iV 3 , which is reminiscent of a 'four-thirds 
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power law' relatin g cr ossinK number and 'rope length' 
for tight knots [S^, IS^, |53 ■ This suggests that this four- 
thirds power law does not reflect on any intimate proper- 
ties of tight knots, except their overall space filling char- 
acter. 

From the average crossing number, one could get an 
idea of how the amount of computational resources in- 
volved in the calculation of a knot invariant, say Alexan- 
der, scales with conformation size. The Alexander invari- 
ant entails computation of the determinant of a C x C 
matrix. Naively using Gaussian elimination, computa- 
tion time would roughly scale as = N^. 
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FIG. 10: Average crossing numbers in the knot projection, 
before and after preprocessing with Reidemeister moves, to- 
gether with mathematical prediction. These were plotted 
against the size (length A'^ = L"^) of the conformation, from 
L = 4 to L = 20. 



B. Knot Probabilities 

Figure E] displays our results for the fraction of con- 
formations (of a given size N — L^) which are unknotted. 
For each L from 4 to 12, 10^ conformations were gener- 
ated. The last data point for the largest conformation 
we were able to analyze (14 x 14 x 14) represents 4 trivial 
knots out of 350000 conformations. 

Since the total number of conformations of the length 
N grows exponentially with N , it is not a surprise that 
the probability of a trivial knot decays exponentially with 
N [m. I55I I . Accordingly, computational data on trivial 
knot probability are customary fit to exponential. In our 
case, the exponential fit to the (last three) data points 
yielded an estimate for the unknotting probability as a 
function of iV, ~ exp(— iV/196), as shown in Figure ITTI 

Previously, there were some works measuring knotting 
probabilities for lattice polygons in confined geometries 
|4^^]. In particular, Mansfield (4^] has examined knots 
of compact Hamiltonian cycles on a lattice - the same 
problem we consider here. However, these authors use 
one invariant, the Alexander polynomial, in their com- 
putations (although Mansfield |43| evaluated Alexander 
polynomial at 10 different values of t). This is under- 



standable, as the Vassiliev invariants are a relatively re- 
cent discovery '4^ , in particular the invention of explicit 
and computationally implementable formulas for their 
evaluation. Moreover, we were able to analyze larger con- 
formations: the work examined N < 1000, while we 
consider N up to 14"^ = 2744, almost three times larger. 

Mansfield's fit to his results (exp(— A^/270)) is shown 
in the thinner, dotted line in Figure ITTI Importantly, our 
results for N < 1000 agree well with both the results 
and the fit by Mansfield However, examination of 

larger N leads us to revise the estimation of characteristic 
length iVo in exp{-N/No) from Nq « 270 to iVo « 196. 
Moreover, our result for A^o may turn out an overesti- 
mate, and real Nq may eventually be found even smaller 
than 200. Indeed, the leading source of inaccuracy in our 
results is due to the incomplete set of topological invari- 
ants. This can lead to errors of assigning the trivial knot 
status to some loops which are in fact not trivial knots. 
Such errors contaminate our trivial knot sets with non- 
trivial knots, leading to the overestimate of trivial knot 
probability, and this effect only increases with growing 
N, because at small N it is much less likely to meet a 
non-trivial knot confused with trivial one by our set of 
knot invariants. Thus, we conclude that the trivial knot 
probability for compact polymers goes as 

^compact, trivial ^ exp {-N/No) , Nq < 196 . (9) 

This result is essential for several reasons. We have 
shown in the section IlII Bl that the sub-chains inside the 
sufficiently big compact globule behave somewhat like 
Gaussian polymers, with R'^{£) proportional to £ despite 
the obvious presence of volume exclusion constraint. This 
fact, consistent with Flory theorem, leads to the tradi- 
tional understanding that the chains in the melt as well 
as sub-chains in the globule are Gaussian. From this, 
it would then be logical to assume that the trivial knot 
probability for them should also be the same as for corre- 
sponding Gaussian polymers, and not the same as for the 
swollen self- avoiding polymers. We remind that the triv- 
ial knot probability for Gaussian polymers, that is, for 
polygons of N segments with no volume exclusion, also 
follows the exponential law exp(— iV/A''o), with A'o vary- 
ing from about 350 for Gaussian random polygons (in 
which all segments have Gaussian distributed lengths) 
|4ll | to about 260 for regular polygons (made of length 
1 segments) (^^lE^. For the self-avoiding polymers, the 
value of A^o is even larger 113 ■ C)ur result now in- 
dicates that in regard to the knot forming ability of the 
polymer, chain compaction not only screens away the 
excluded volume, reducing A'o from its value for "thick" 
polymers to that for "thin" ones, but produces the much 
more dramatic effect, decreasing A'o significantly below 
its Gaussian value. In brief, compact polymers, although 
they satisfy Flory theorem, are not Gaussian for topolog- 
ical purposes, they are much (exponentially) more prone 
to forming knots. 

The Figure El displays the probabilities of some non- 
trivial knots in compact loops as the function of the loop 
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FIG. 11: Trivial knot probabilities for conformations of size 
L = 4 to L = 14. The thinner dotted line represents Mans- 
field's 43) fit to his data points. 
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FIG. 12; Probabilities of occurrence of a few knots. 



C. Statistics of segments and loops in trivial knots 

In this section, we want to address the following prob- 
lem. Consider a sub-chain of some length £ which is large, 
but much smaller (in a proper sense) than the entire glob- 
ule. Suppose further that the chain as a whole is closed, 
so it is a loop, and that this loop is a trivial knot. On the 
one hand, since £ <^ N, it seems that the sub-chain has no 
way to " know" what are the global topological properties 
of the entire loop. On the other hand, it is also obvious 
that the property of being a trivial knot is not a local but 
a global property of the loop. In some loose sense, we can 
say that since the entire loop has no knots, there is no 
way the sub-chain of the length £ N may have knots. 
Of course, to speak about knots in a sub-chain we should 
somehow decide how to close its ends; what we are saying 
here is that the sub-chain of an unknotted loop must not 
have knots under the majority of natural ways to connect 
its ends. This logic then seems to suggest that the sub- 
chain may tend to be swollen compared to its random 
walk size based on the analogy with loops in unre- 
stricted space in which trivial knots are known to swell 
[40 . 56, 58]. However attractive, this logic at least does 
not exhaust the problem, because if sub-chain sizes were 
to scale as £^ with n > 1/2, then these sub-chains would 
strongly overlap in the overall compact globule, making it 
difficult to avoid making knots between the sub-chains. 
All these inconclusive arguments are presented here in 
order to motivate the problem: how does the sub-chain 
size (say, end-to-end distance) scale with the sub-chain 
length if the sub-chain is buried deeply inside a collapse 
trivial knot? 



length. Similar to the studies made with non-compact 
chain models (see, e.g., ^3;0l)i the probabihty to ob- 
tain any particular knot starts from at small N, then 
reaches a maximum at some finite value of N, and then 
decreases and asymptotically approaches to with fur- 
ther growth of N. As in other cases, the qualitative ex- 
planation of this tendency is clear. When N is small, the 
loop might be too short to form a given knot. In fact, 
for the lattice model, it is clear that for every knot there 
is a finite value of N below which this knot cannot be 
formed at all, so its probability is exactly (for instance, 
the shortest loop capable of forming a non-trivial knot 
on the cubic lattice has = 24 segments). However, 
even for significantly larger A^ there might still be rela- 
tively few conformations to realize the given knot, and 
that yields low probability. At the other end, when A^ is 
exceedingly large, there are great many knots which can 
be comfortably formed, and their number keeps increas- 
ing with A^, yielding a decaying probability to locate the 
given knot. We should emphasize that the results pre- 
sented in Figure ^1 although qualitatively reasonable, 
have somewhat preliminary character, because our use 
of the restricted set of topological invariants at the very 
high crossing numbers may lead to inaccurate knot as- 
signments. 
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FIG. 13: Ratio of sub-chain mean-square end-to-end distance 
in trivial knots and in all loops versus number of links in the 
sub-chain. For the chain of the length N = , filling LxLxL 
cube, results were plotted up to L^. 

Measurements of mean-square end-to-end distance (de- 
fined similarly to Eq. JHl) were made on sub-chains (seg- 
ments) of compact chain conformations with trivial knots 
and on sub-chains of all conformations regardless of knot- 
type. The results (figure I13|l show that sub-chains of 
trivial knots are smaller or more compact compared to 
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FIG. 14: Ratio of number of sub-loops in trivial knots and in 
all loops, versus number of links in the sub-loop. Results for 
L = 12 and L — 14 were not plotted due to excessive 'noise'. 
This result complements figure IT!T1 



sub-chains of all knots. A similar result was also obtained 
for the gyration radius, which is another measure of size. 
(The extrema in the plots are, of course, effects of the 
finite size of the conformations.) 

Measurements of the number of sub-loops formed in 
each conformation were also made (Figure ^1 A loop is 
formed when monomers not connected by a link are next 
to each other in space). The result for the number of 
sub- loops in conformations with trivial knots, compared 
to the number of sub-loops in conformations regardless 
of knot-type, is in complete agreement with the previ- 
ous results: since sub-chains are more compact in overall 
trivial knots, they are more likely to form sub-loops. 

These results should be contrasted to the correspond- 
ing results for gyration radius of (entire) non-compact 
rings, which indicate that trivial knots in such rings are, 
on average, larger compared to all other knots 40, 56]. 
This is understood based on the argument that there 
are very compact conformations available for non-trivial 
knots are included in the average over all loops and 
are excluded from the average over trivial knots only. 
Clearly, for this ensemble of unrestricted loops, trivial 
knots remain swollen compared to the all-loops-average 
not only on the level of entire polymer, but also on the 
level of the sub-chains. In fact, this effect is expected to 
be scale-invariant at the length exceeding the characteris- 
tic knotting length Nq [l^ . Based on this comparison, we 
can conclude that it must be significantly more difficult 
to confine a trivial knot loop into a small volume than 
to realize a similar confinement of a phantom polymer, 
either a chain or a loop. Indeed, to compress a trivial 
knot one has to reduce its entropy by forcing all the sub- 
chains to shrink. This means, confinement entropy for 
the trivial knot is a volume effect, it scales as N in ther- 
modynamic limit. It must be compared with confinement 
entropy of usual polymers which only scales as N^/^ [2^ . 
This conclusion of the increased stiffness of trivial knots 
compared to other loops is consistent with the data of 



the work |56| on the probability distributions of the un- 
restricted loop sizes: with decreasing overall loop size, 
this probability decreases much sharper for trivial knots 
than for averaged loops. 

Although short of a proof, our results are consistent 
with the hypothesis of a " crumpled globule," which was 
formulated many years ago [59|, and which remains in 
the rank of hypothesis till today. 



CONCLUSION 



We formulated the new combinatorial algorithm for 
generation of Hamiltonian walks and cycles on the cu- 
bic lattices. This algorithm reduces biases compared to 
the previously known methods. The presented algorithm 
performs well on generation of the large compact self- 
avoiding walks. 

We employed the proposed generation algorithm to 
verify Flory theorem in its applicability to the random 
compact chains. We found that the statistics of the sub- 
chains inside the large globule approaches Gaussian, as 
predicted by Flory theorem, for sufficiently long poly- 
mers. Unexpectedly, this happens at rather large values 
of chain length N, about 10^. Although it is not entirely 
clear what is the most reasonable numerical correspon- 
dence between TV for the lattice toy model and the num- 
ber of residues a the real protein, it is safe to question 
the direct applicability of Gaussian statistics for the inte- 
rior of even large protein globules. On the other hand, it 
should be understood that the deviations from Gaussian 
statistics found for modest N compact chains are really 
small, and unless one is interested in sophisticated scal- 
ing analysis, they provide very reasonable qualitative fit 
to the data. 

Using knot invariants, we were able to identify the triv- 
ial knots and the first few knots in a sample of loop con- 
formations. We found that the probability of trivial knot 
in a compact conformation is significantly smaller than 
was previously believed, and that it is much smaller than 
for the corresponding Gaussian polymer. This suggests 
that there should be an abundance of knots in a random 
sample of compact conformations. We have also found 
that global restriction that the loop as a whole is a trivial 
knot has a dramatic statistical effect on the conforma- 
tions of all sub-chains, making them significantly more 
compact than for other loops. 

Our results suggest that low propensity of knots in real 
proteins might in fact be a statistically significant fact 
requiring an explanation, although it seems too early to 
speculate what this explanation might be, whether it is 
related to the physics of folding, or to some functional 
properties of proteins, or to some aspect of their evolu- 
tion. 
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APPENDIX A: IS THE NEW COMBINATORIAL 
ALGORITHM UNBIASED? 

The building of the Hamiltonian walk on the lattice 
with the help of some combinatorial algorithm can be 
viewed as the process of labeling the edges of the lat- 
tice according to some rules (as two matching, patching 
or other procedures). One of the rules is that none of 
the lattice nodes may have more then two labeled edges 
incident on it. There are different configurations of the 
labeled edges possible on the lattice. We now would like 
to consider the space of all the possible such configura- 
tions. Such space itself can be represented as a graph, 
in which every configuration of labeled edges is a vertex, 
and two vertices are connected if and only if the corre- 
sponding configurations differ only by the labels of one 
lattice edge. Such space includes configuration in which 
none of the edges is labeled. We call such a configuration 
root. The space can be divided into the following sub- 
spaces: 

i) configurations of labeled edges at which some of the 
lattice nodes do not have incoming labeled edges (dis- 
connected nodes); 

ii) configurations containing multiple sub-cycles and sub- 
chains, all the lattice nodes have two incident labeled 
edges except the ends of the sub-chains. No new lattice 
edge can be labeled. (Such configurations the algorithm 
used to start patching procedure); 

iii) Hamiltonian cycles. 

The configuration space is schematically shown in the 
Figure 9. As an illustration we display different configu- 
rations possible on the extended 2x2x2 lattice. 

An arbitrary combinatorial algorithm building a 
Hamiltonian walk starts from the root node of the config- 
uration space graph, then performs random walk along 
some path on the graph, and finishes its work at some 
node of subspace (iii). For the algorithm to be unbiased, 
the number of all possible paths leading to each node in 
the subspace (iii) should be equal. 

Let us consider the procedures of labeling random 
links, branching and patching of algorithm Pj. The ran- 
dom labeling of links and branching of sub-chains may 
lead either directly to the formation of the Hamiltonian 
cycle from subspace (iii) , or to the formation of some con- 
figuration from the subspace (ii). The latter situation is 
much more probable due to the size of the subspace (ii) 
is much larger than the size of (iii). Suppose the algo- 




FIG. 15: The space of possible configurations of links on the 
cubic lattice. Different subspaces and example configurations 
of links are shown. 



rithm generated some configuration from (ii). Now the 
patching procedure has to transform it to the single cy- 
cle. Even if one supposes that configurations from (ii) 
and (iii) are generated with equal probability, it appears 
that the number of paths leading from (ii) to different 
Hamiltonian cycles in (iii) is different. This can be easily 
seen from the enumeration of all possible ways to label 
the 2x2x2 lattice. The configurations 1 and 2 can be 
transformed to the Hamiltonian cycles 4 and 5, but there 
is no way to obtain the cycle 6 as a result of patching. 
Moreover, the number of paths to cycles 4 and 5 is also 
slightly different. In general, the probability to generate 
some Hamiltonian walk is proportional to the number of 
possible configurations of sub-cycles which can be trans- 
formed to this walk and to the number of ways to apply 
patching procedures to these configurations of sub-cycles. 
And this is the patching procedure that leads to the bi- 
ased sampling of Hamiltonian walk. Figure 1151 gives a 
simple example. 

Also it can be shown that the formation of the con- 
figuration with dead ends (similar to the configuration 
3 in Fig. [T5|) produces biased sampling of Hamiltonian 
walks too. The dead end forms if some vertex of the lat- 
tice which has only one incoming link has no unsaturated 
neighbors. 

The algorithm 1] can be corrected by avoiding, on all 
stages, placing a new link if it leads to either the clos- 
ing of a sub-cycle, or the formation of the dead ends. If 
the formation of the sub-cycles and the dead ends is for- 
bidden, then paths starting from the root configuration 
and ending in the subspace (iii) do not pass through the 
subspace (ii), and the patching is not applied. 

Undoubtedly, placing the links on the lattice in random 
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order docs not produce any biases. As for the branching 
of the sub-chains we are not so sure. However, in our 

I 



Input: 
Output: 



Begin; 

Color vertices of LG alternatively white and black; 
(if Case 1): Generate extended lattice graph EG; 
PerformRandomBipartiteMatchingO; 

End. 

Subroutines: 

PerforniRandomBipartiteMatchingQ : 
Begin; 

While(number of unsaturated vertices > 0) 
Choose random unsaturated vertex P; 
Choose random neighbor Q; 
if {Q unsaturated): 

TryLinkVertices{P, Q); 
else if {Q saturated): 

Choose direction along sub-chain, QS; 

Find end of sub-chain, T; 

TryGrowSubchain{T); 

Remove link QR; 

TryLinkVertices{P, Q); 
End if; 
End while; 

End. 



simulations we did not see any worrisome signs from this 
procedure. 



APPENDIX B: PSEUDOCODE 

A lattice graph LG(vertices V, edges E). 

Case 1: Hamiltonian cycle WE on the extended lattice graph; 
Case 2: (If LG is even):IIamiltonian cycle WL on LG. 



TryLinkVertices(P, Q): 
Begin; 

Draw link PQ] 

Find dead ends and cycles; 

if dead ends found, or (length of cycle < length of complete Hamiltonian walk): 
Remove link PQ; 

End. 

TryGrowSubchainiT) ; 
Begin; 

List unsaturated neighbors of T; 
While List is not empty: 

Choose random vertex X from List; 

TryLinkVertices{X, T); 

if link XT is drawn: 
End. 

else: 

Remove link X from List; 
End while; 

End. 
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